{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这一节聊一下HMM与CRF之间的区别与联系，内容主要参考自李宏毅老师的视频：[《HMM/CRF by 李宏毅》](https://www.bilibili.com/video/BV1zJ411575b?from=search&seid=11892221500659184153)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 一.联系\n",
    "\n",
    "先看看它们有什么相同的地方，我们知道HMM模型可以表示如下：   \n",
    "\n",
    "$$\n",
    "P(x,y)=\\prod_{i=1}^np(y_{i}\\mid y_{i-1})p(x_i\\mid y_i)\n",
    "$$   \n",
    "\n",
    "这里，起始点看做$p(y_1\\mid y_0)=p(y_1)$，而CRF模型可以写作：   \n",
    "\n",
    "$$\n",
    "P(y\\mid x)=\\frac{exp(w^TF(y,x))}{Z(x)}\n",
    "$$  \n",
    "\n",
    "似乎看起来风马牛不相及，但如果HMM做一点变换，它也可以写作CRF的形式，假设$x_i,i=1,2,..,n$可能取值的情况有$N$种：$o_1,o_2,...,o_N$，而$y_i,i=1,2,...,n$可能取值的情况有$M$种：$q_1,q_2,...,q_M$，那么：   \n",
    "\n",
    "$$\n",
    "P(x,y)=\\prod_{i=1}^np(y_{i}\\mid y_{i-1})p(x_i\\mid y_i)\\\\\n",
    "=\\prod_{i=1}^n\\prod_{k_1=1}^M\\prod_{k_2=1}^M\\prod_{t=1}^Np(y_{i}=q_{k_1}\\mid y_{i-1}=q_{k_2})^{I(y_{i}=q_{k_1} , y_{i-1}=q_{k_2})}p(x_i=o_t\\mid y_i=q_{k_1})^{I(x_i=o_t,y_i=q_{k_1})}\\\\\n",
    "=\\prod_{k_1=1}^M\\prod_{k_2=1}^M\\prod_{t=1}^Np(q_{k_1}\\mid q_{k_2})^{N_1(q_{k_1},q_{k_2})}p(o_t\\mid q_{k_1})^{N_2(o_t,q_{k_1})}\\\\\n",
    "=exp(\\sum_{k_1=1}^M\\sum_{k_2=1}^MN_1(q_{k_1},q_{k_2})logp(q_{k_1}\\mid q_{k_2})+\\sum_{k_1=1}^M\\sum_{t=1}^NN_2(o_t,q_{k_1})logp(o_t\\mid q_{k_1}))\\\\\n",
    "=exp(logp(q_1\\mid q_1)\\cdot N_1(q_1,q_2)+\\cdots+logp(q_M\\mid q_M)\\cdot N_1(q_M,q_M)+logp(o_1,\\mid q_1)\\cdot N_2(o_1,q_2)+\\cdots logp(o_N\\mid q_M)\\cdot N_2(o_N,q_M) )\\\\\n",
    "=exp([logp(q_1\\mid q_1),...,logp(o_N\\mid q_M)][N_1(q_1,q_2),...,N_2(o_N,q_M)]^T)\\\\\n",
    "=exp(w^TF(y,x))\n",
    "$$   \n",
    "\n",
    "这里，$I(\\cdot)$表示指示函数，$I(x_i=o_t,y_i=q_{k_1})$表示如果$x_i=o_t$且$y_i=q_{k_1}$，那么返回结果1，否则返回结果0，$I(y_{i}=q_{k_1},y_{i-1}=q_{k_2})$表示，如果$y_{i}=q_{k_1}$且$y_i=q_{k_2}$那么返回结果1，否则返回结果0，而：   \n",
    "\n",
    "$$\n",
    "N_1(q_{k_1},q_{k_2})=\\sum_{i=1}^nI(y_{i}=q_{k_1},y_{i-1}=q_{k_2})\\\\\n",
    "N_2(o_t,q_{k_1})=\\sum_{i=1}^nI(x_i=o_t,y_i=q_{k_1})\n",
    "$$   \n",
    "\n",
    "显然，最后有：   \n",
    "\n",
    "$$\n",
    "w=[logp(q_1\\mid q_1),...,logp(o_N\\mid q_M)]^T\\\\\n",
    "F(y,x)=[N_1(q_1,q_2),...,N_2(o_N,q_M)]^T\n",
    "$$  \n",
    "\n",
    "大家看一下，这是不是就和CRF一一对应了呢，上面函数$I(\\cdot)$对应了特征函数$f_k(y_{i-1},y_i,x,i)$，$N1(\\cdot),N_2(\\cdot)$对应了特征函数出现的次数的累计$f_k(y,x)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 二.区别\n",
    "虽然它们具有一样的表现形式，但是训练它们的“驱动力”不一样，HMM是最大化的联合概率分布的似然估计，而CRF是最大化的条件概率分布的似然估计，虽然它们都在最大化已观测数据的概率：   \n",
    "\n",
    "（1）但是，CRF对于未观测到的模式具有一定抑制的作用（对于$Z(x)$，有缩小它的趋势，而$Z(x)$中包含了我们未能观测数据的概率），所以，如果数据量足够可以考虑采用CRF（数据量太少，那些未出现的“好模式”也会被抑制）   \n",
    "\n",
    "（2）那么，反过来，HMM对于未知模式就具有一定的“创造性”了，首先它对于未知模式没有抑制作用，其次，由于条件独立性假设，它还会提高某些未知模式的概率，例如，训练集中只有两种模式“A->B-D”和“H->B->C”,它们的训练样本各占一半，那么对于“A->B->C”这种从未出现过的模式，它也能给出50%的概率值，但是这也是它的弊端，因为对于未知模式，也包含了“坏模式”"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 小结一下\n",
    "\n",
    "（1）HMM和CRF对于已观测数据都起着积极作用，即提升这部分数据出现的概率值；   \n",
    "\n",
    "（2）对于未观测数据，CRF对于“好坏模式”都会抑制，而HMM对于“好坏模式”都会提升，所以对于算法学习来说两者不分伯仲，难分好坏； **但是呢**，目前可用于训练的数据量往往很大，而且逐年累增，所以CRF可能更胜一筹"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
